We have daily sales data from BND. The extract covers the sales history from the first quarter of 2016 through the third week of January 2018.
Sales are grouped at the P2 level, where every product is counted separately and promotional sales are included. The grouping by client is done at the C4 level, where results are summed by distributor.
product_raw_df.head()
| Product | Client | 04/01/2016 | 05/01/2016 | 06/01/2016 | 07/01/2016 | 08/01/2016 | 09/01/2016 | 10/01/2016 | 11/01/2016 | ... | 14/01/2018 | 15/01/2018 | 16/01/2018 | 17/01/2018 | 18/01/2018 | 19/01/2018 | 20/01/2018 | 21/01/2018 | 22/01/2018 | 23/01/2018 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GBA001AUC180FS | 68L041 | 0.00 | 2.00 | 0.00 | 2.00 | 0.00 | 3.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 1 | GBA001AUC180FS | 68L042 | 0.00 | 1.00 | 2.00 | 1.00 | 0.00 | 2.00 | 0.00 | 1.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 2 | GBA001BND060FS | 68C011 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 3 | GBA001BND060FS | 68C120 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 4 | GBA001BND060FS | 68C123 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
5 rows × 753 columns
Since the daily sales of some products fluctuate strongly, we will work with weekly sales histories instead.
weekly_product_df = product_df.resample('W', axis=1).mean()
weekly_product_df.head()
| 2016-01-10 00:00:00 | 2016-01-17 00:00:00 | 2016-01-24 00:00:00 | 2016-01-31 00:00:00 | 2016-02-07 00:00:00 | 2016-02-14 00:00:00 | 2016-02-21 00:00:00 | 2016-02-28 00:00:00 | 2016-03-06 00:00:00 | 2016-03-13 00:00:00 | ... | 2017-11-26 00:00:00 | 2017-12-03 00:00:00 | 2017-12-10 00:00:00 | 2017-12-17 00:00:00 | 2017-12-24 00:00:00 | 2017-12-31 00:00:00 | 2018-01-07 00:00:00 | 2018-01-14 00:00:00 | 2018-01-21 00:00:00 | 2018-01-28 00:00:00 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.00 | 1.14 | 1.29 | 1.00 | 0.71 | 1.43 | 1.86 | 1.57 | 1.00 | 1.57 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 1 | 0.86 | 1.43 | 1.57 | 1.57 | 1.71 | 1.29 | 1.71 | 1.29 | 1.00 | 0.57 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 3 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 2.29 | 1.29 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 4 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
5 rows × 108 columns
product_df.head()
The last 8 values (all zeros) of each series have been dropped. The first 0 values (all zeros) of each series have been dropped.
| 2016-01-10 00:00:00 | 2016-01-17 00:00:00 | 2016-01-24 00:00:00 | 2016-01-31 00:00:00 | 2016-02-07 00:00:00 | 2016-02-14 00:00:00 | 2016-02-21 00:00:00 | 2016-02-28 00:00:00 | 2016-03-06 00:00:00 | 2016-03-13 00:00:00 | ... | 2017-10-01 00:00:00 | 2017-10-08 00:00:00 | 2017-10-15 00:00:00 | 2017-10-22 00:00:00 | 2017-10-29 00:00:00 | 2017-11-05 00:00:00 | 2017-11-12 00:00:00 | 2017-11-19 00:00:00 | 2017-11-26 00:00:00 | 2017-12-03 00:00:00 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.00 | 1.14 | 1.29 | 1.00 | 0.71 | 1.43 | 1.86 | 1.57 | 1.00 | 1.57 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 1 | 0.86 | 1.43 | 1.57 | 1.57 | 1.71 | 1.29 | 1.71 | 1.29 | 1.00 | 0.57 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 1.14 | 1.57 | 1.14 | 0.86 | 1.29 | 1.14 | 0.29 | 0.00 | 0.00 | 0.00 |
| 3 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 1.57 | 2.29 | 2.00 | 2.14 | 2.00 | 2.43 | 2.86 | 1.14 | 2.29 | 1.29 |
| 4 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
5 rows × 100 columns
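The edge-trimming reported above can be sketched as follows (a hypothetical helper; `trim_zero_edges` and the demo frame are not part of the notebook):

```python
import numpy as np
import pandas as pd

def trim_zero_edges(df):
    """Drop leading and trailing weeks that are zero for every series."""
    totals = df.sum(axis=0).to_numpy()
    nonzero = np.flatnonzero(totals)
    if nonzero.size == 0:
        return df
    return df.iloc[:, nonzero[0]:nonzero[-1] + 1]

weeks = pd.date_range('2016-01-10', periods=6, freq='W')
demo = pd.DataFrame([[0.0, 1.0, 2.0, 0.0, 0.0, 0.0]], columns=weeks)
trimmed = trim_zero_edges(demo)  # keeps only the two non-zero weeks
```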
Remove the products that weren't sold in the last 15 weeks
t = 15
Series with 15 trailing zeros are removed. Removed: 4327, Remaining: 4319
Remove the products that weren't sold in the first 15 weeks
t = 15
Series with more than 15 leading zeros are removed. Removed: 1472, Remaining: 2847
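Both filters can be sketched along these lines (hypothetical helpers, assuming the weekly DataFrame has one column per week):

```python
import pandas as pd

def drop_trailing_zero_series(df, t=15):
    """Drop series whose last t weeks are all zero (likely discontinued)."""
    return df[(df.iloc[:, -t:] != 0).any(axis=1)]

def drop_leading_zero_series(df, t=15):
    """Drop series whose first t weeks are all zero (launched too late)."""
    return df[(df.iloc[:, :t] != 0).any(axis=1)]

demo = pd.DataFrame([[1.0, 1.0, 0.0, 0.0],
                     [1.0, 1.0, 1.0, 1.0],
                     [0.0, 0.0, 1.0, 1.0]])
kept = drop_leading_zero_series(drop_trailing_zero_series(demo, t=2), t=2)
```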
product_df_clean = product_df
print(product_df.shape)
product_df.head()
(2847, 100)
| 2016-01-10 00:00:00 | 2016-01-17 00:00:00 | 2016-01-24 00:00:00 | 2016-01-31 00:00:00 | 2016-02-07 00:00:00 | 2016-02-14 00:00:00 | 2016-02-21 00:00:00 | 2016-02-28 00:00:00 | 2016-03-06 00:00:00 | 2016-03-13 00:00:00 | ... | 2017-10-01 00:00:00 | 2017-10-08 00:00:00 | 2017-10-15 00:00:00 | 2017-10-22 00:00:00 | 2017-10-29 00:00:00 | 2017-11-05 00:00:00 | 2017-11-12 00:00:00 | 2017-11-19 00:00:00 | 2017-11-26 00:00:00 | 2017-12-03 00:00:00 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 37 | 4.71 | 4.29 | 4.71 | 5.14 | 4.14 | 4.57 | 4.71 | 5.00 | 4.00 | 4.57 | ... | 2.29 | 1.71 | 2.14 | 1.86 | 2.29 | 2.14 | 1.71 | 3.00 | 2.57 | 0.00 |
| 39 | 8.57 | 13.71 | 11.86 | 6.29 | 6.29 | 6.14 | 6.29 | 5.14 | 5.29 | 6.14 | ... | 7.86 | 9.71 | 19.29 | 17.29 | 9.00 | 8.00 | 7.14 | 0.00 | 0.00 | 0.00 |
| 40 | 3.71 | 0.86 | 0.43 | 0.14 | 0.29 | 0.29 | 1.00 | 0.43 | 2.00 | 4.14 | ... | 1.29 | 1.29 | 1.29 | 1.29 | 1.00 | 1.29 | 1.00 | 0.00 | 0.00 | 0.00 |
| 44 | 1.14 | 1.29 | 0.86 | 1.43 | 2.86 | 1.29 | 1.57 | 1.86 | 1.86 | 1.71 | ... | 0.29 | 0.14 | 0.29 | 0.43 | 0.71 | 0.29 | 0.71 | 0.00 | 0.00 | 0.00 |
| 45 | 1.14 | 0.71 | 0.86 | 1.14 | 1.14 | 1.14 | 1.43 | 0.86 | 1.14 | 0.86 | ... | 0.86 | 1.29 | 1.29 | 1.29 | 1.00 | 1.00 | 0.57 | 0.00 | 0.00 | 0.00 |
5 rows × 100 columns
Apply a rolling-average filter with a window of size 4, then remove rare sales
window = 4
(2847, 97)
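A minimal sketch of the rolling-average step, assuming the weekly frame has one column per week (the transpose runs the rolling window over the time axis; dropping NaNs removes the first window-1 weeks, matching the 100 → 97 column change above):

```python
import pandas as pd

weeks = pd.date_range('2016-01-10', periods=6, freq='W')
demo = pd.DataFrame([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]], columns=weeks)

# Rolling 4-week mean along the time axis, then drop the first
# window-1 weeks that lack a complete window.
smoothed = demo.T.rolling(window=4).mean().T.dropna(axis=1)
```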
Winsorizing sets all outliers to a specified percentile of the data; for example, a 90% winsorization sets all data below the 5th percentile to the 5th percentile, and all data above the 95th percentile to the 95th percentile.
t = 4
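A minimal illustration of winsorization with `scipy.stats.mstats.winsorize` (the exact limits used in the notebook are not shown, so 10% on each side is assumed here):

```python
import numpy as np
from scipy.stats.mstats import winsorize

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0])
# Clip the lowest and the highest 10% of values to the nearest kept value
w = winsorize(x, limits=(0.1, 0.1))
```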
Remove the products that weren't sold for at least 5 weeks
t = 5
Series with fewer than 5 values are removed. Removed: 1, Remaining: 2846
We end up with 2846 product/client series. Apply a z-normalization (subtract the mean and divide by the standard deviation) to obtain scale-independent data.
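The z-normalization can be sketched as follows (hypothetical `z_normalize` helper; the guard avoids dividing flat series by zero):

```python
import numpy as np

def z_normalize(X):
    """Row-wise z-normalization: each series gets zero mean, unit std."""
    mu = X.mean(axis=1, keepdims=True)
    sigma = X.std(axis=1, keepdims=True)
    sigma[sigma == 0] = 1.0  # guard: leave flat series at zero instead of NaN
    return (X - mu) / sigma

X_z_demo = z_normalize(np.array([[1.0, 2.0, 3.0]]))
```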
product_df_full.head()
| Product | Client | 2016-01-24 00:00:00 | 2016-01-31 00:00:00 | 2016-02-07 00:00:00 | 2016-02-14 00:00:00 | 2016-02-21 00:00:00 | 2016-02-28 00:00:00 | 2016-03-06 00:00:00 | 2016-03-13 00:00:00 | ... | 2017-09-24 00:00:00 | 2017-10-01 00:00:00 | 2017-10-08 00:00:00 | 2017-10-15 00:00:00 | 2017-10-22 00:00:00 | 2017-10-29 00:00:00 | 2017-11-05 00:00:00 | 2017-11-12 00:00:00 | 2017-11-19 00:00:00 | 2017-11-26 00:00:00 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 37 | GBA001BND060SS | 68A139 | 1.61 | 1.61 | 1.61 | 1.61 | 1.61 | 1.61 | 1.61 | 1.61 | ... | -1.00 | -1.26 | -1.26 | -1.26 | -1.26 | -1.17 | -1.26 | -0.96 | -0.88 | -1.26 |
| 39 | GBA001BND060SS | 68C120 | 0.51 | 0.30 | -0.40 | -0.91 | -1.01 | -1.10 | -1.10 | -0.55 | ... | 1.08 | 0.06 | 0.99 | 1.76 | 1.80 | 1.71 | 0.60 | -0.98 | -1.26 | -1.26 |
| 40 | GBA001BND060SS | 68C122 | -0.15 | -1.04 | -1.04 | -1.04 | -0.96 | -0.52 | 0.48 | 1.66 | ... | -0.45 | -0.33 | -0.22 | -0.15 | -0.22 | -0.22 | -0.30 | -0.63 | -0.89 | -1.04 |
| 44 | GBA001BND060SS | 68C126 | -0.11 | 0.65 | 0.65 | 0.97 | 1.16 | 0.71 | 0.90 | 1.03 | ... | -1.37 | -1.43 | -1.43 | -1.43 | -1.43 | -1.43 | -1.24 | -1.43 | -1.43 | -1.43 |
| 45 | GBA001BND060SS | 68C129 | -0.35 | -0.35 | 0.33 | 1.25 | 0.79 | 0.79 | 0.33 | -0.35 | ... | 0.56 | 1.25 | 0.79 | 1.02 | 1.25 | 0.79 | -0.35 | -1.95 | -1.95 | -1.95 |
5 rows × 99 columns
hlp.Clusters_plot(X=X_pca, labels=np.zeros(len(X_pca)))
[ 0.192122 0.13360553 0.0868289 0.07429497 0.04438952]
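The vector above is the PCA explained-variance ratio; the decomposition can be sketched with scikit-learn on synthetic data (`X_demo` stands in for the z-normalized series, which are not reproduced here):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 10))        # stand-in for the z-normalized series

pca = PCA(n_components=5)
X_pca = pca.fit_transform(X_demo)         # series projected on the 5 first axes
princ_axis = pca.components_              # one principal axis per row
ratios = pca.explained_variance_ratio_    # variance share per component
```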
Displays an increasing trend over time.
plt.plot(range(nb_col), princ_axis[1, :])
Contrasts sales in terms of seasonality. Products correlated with this axis tend to be sold in autumn or at the beginning of summer.
plt.plot(range(nb_col), princ_axis[2, :])
corrSamples = hlp.GetMostCorrelatedTo(X_pca,component,index=product_df.index)
Product 1744
Try out hierarchical clustering, k-means and k-medoids on the raw (cleaned) data. Then draw a PCA plot to visualize the clustering result on the principal components.
product_df_full.head()
| Product | Client | 2016-01-24 00:00:00 | 2016-01-31 00:00:00 | 2016-02-07 00:00:00 | 2016-02-14 00:00:00 | 2016-02-21 00:00:00 | 2016-02-28 00:00:00 | 2016-03-06 00:00:00 | 2016-03-13 00:00:00 | ... | 2017-09-24 00:00:00 | 2017-10-01 00:00:00 | 2017-10-08 00:00:00 | 2017-10-15 00:00:00 | 2017-10-22 00:00:00 | 2017-10-29 00:00:00 | 2017-11-05 00:00:00 | 2017-11-12 00:00:00 | 2017-11-19 00:00:00 | 2017-11-26 00:00:00 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GBA001BND060SS | 68A139 | 1.61 | 1.61 | 1.61 | 1.61 | 1.61 | 1.61 | 1.61 | 1.61 | ... | -1.00 | -1.26 | -1.26 | -1.26 | -1.26 | -1.17 | -1.26 | -0.96 | -0.88 | -1.26 |
| 1 | GBA001BND060SS | 68C120 | 0.51 | 0.30 | -0.40 | -0.91 | -1.01 | -1.10 | -1.10 | -0.55 | ... | 1.08 | 0.06 | 0.99 | 1.76 | 1.80 | 1.71 | 0.60 | -0.98 | -1.26 | -1.26 |
| 2 | GBA001BND060SS | 68C122 | -0.15 | -1.04 | -1.04 | -1.04 | -0.96 | -0.52 | 0.48 | 1.66 | ... | -0.45 | -0.33 | -0.22 | -0.15 | -0.22 | -0.22 | -0.30 | -0.63 | -0.89 | -1.04 |
| 3 | GBA001BND060SS | 68C126 | -0.11 | 0.65 | 0.65 | 0.97 | 1.16 | 0.71 | 0.90 | 1.03 | ... | -1.37 | -1.43 | -1.43 | -1.43 | -1.43 | -1.43 | -1.24 | -1.43 | -1.43 | -1.43 |
| 4 | GBA001BND060SS | 68C129 | -0.35 | -0.35 | 0.33 | 1.25 | 0.79 | 0.79 | 0.33 | -0.35 | ... | 0.56 | 1.25 | 0.79 | 1.02 | 1.25 | 0.79 | -0.35 | -1.95 | -1.95 | -1.95 |
5 rows × 99 columns
SSE: the sum of squared errors is a measure of variation, or deviation from the mean. It represents the sum of the squared differences of each point from its cluster center.
Silhouette score: evaluates whether each point belongs to the "right" cluster: is it close to the points of the cluster it belongs to? Is it far from the other points?
def getSSE(samples, labels):
    # Sum of squared differences of each sample from its cluster center
    sse = 0.0
    for label in np.unique(labels):
        cluster = samples[labels == label]
        sse += np.sum((cluster - cluster.mean(axis=0)) ** 2)
    return sse

def getSilhouette(samples, labels):
    return metrics.silhouette_score(samples, labels)
The principle is to group together the individuals that display similar curves (in Euclidean distance or in correlation). Distances are computed between products pairwise: the more dissimilar two observations are, the larger the distance. Agglomerative hierarchical clustering (CAH) then merges individuals iteratively to produce a dendrogram, or classification tree.
Z = hierarchy.linkage(X_z, method='complete', metric='euclidean')
hierarchy.dendrogram(Z, truncate_mode='lastp', p=100, leaf_rotation=90., leaf_font_size=7., show_contracted=True)
plot_inertie()
Ward's method merges clusters in a way that keeps the between-class inertia as high as possible, i.e. each merge is chosen to minimize the increase in within-class inertia.
n_cluster = 116
ward = AgglomerativeClustering(n_clusters=n_cluster, linkage='ward').fit(X_z)
BIRCH is used to perform hierarchical clustering over particularly large datasets. Its main advantage is its ability to cluster incoming data points incrementally and dynamically.
labels_birch = Birch(n_clusters= n_cluster, threshold=0.5, compute_labels=True).fit_predict(X_z)
Self-organizing maps consist of a grid. At each node of the grid sits a "neuron". Each neuron is linked to a reference vector responsible for one region of the data space (a number of similar curves).
som = sompy.SOMFactory().build(X_z, mapsize, mask=None, mapshape='rectangular', lattice='rect', normalization='var', initialization='pca', neighborhood='gaussian', training='batch', name='sompy')
Topographic error = 0.153900210822; Quantization error = 6.2435798326
The main idea is to define k centers, one per cluster, then assign each point of the data set to its nearest center. Once all points are assigned, recompute the k centroids as the barycenters of the clusters, and rebind each point to the nearest new center. Repeat until no more changes occur, i.e. the centers no longer move.
plot_inertia()
plot_silhouaite()
kmeans = KMeans(n_clusters=92).fit(X_z)
The k-medoids algorithm is a clustering algorithm related to k-means. K-means attempts to minimize the total squared error, while k-medoids minimizes the sum of dissimilarities between the points labeled as belonging to a cluster and a point designated as the center of that cluster. In contrast to k-means, k-medoids chooses actual data points as centers (medoids or exemplars).
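The notebook delegates to a separate `kMedoids` helper; as a rough, self-contained sketch of the algorithm just described (illustrative only, not the notebook's implementation):

```python
import numpy as np

def k_medoids(D, k, max_iter=100, seed=0):
    """Minimal k-medoids on a precomputed (n x n) distance matrix D."""
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # nearest-medoid assignment
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue
            # new medoid: the member with the smallest total distance to its cluster
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(np.sort(new_medoids), np.sort(medoids)):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return labels, medoids

points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
D = np.abs(points[:, None] - points[None, :])
labels, medoids = k_medoids(D, k=2)  # two well-separated groups
```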
corr_distance = squareform(pdist(X_z, 'correlation')) #Pearsons
euclid_distance = squareform(pdist(X_z, 'euclidean'))
plot_inertia()
plot_silhouaite()
label, medoids_euc = kMedoids.cluster(euclid_distance, k=113)
df.nlargest(7,"SSE")
| SSE | |
|---|---|
| Agg_complete | 521,110.11 |
| Ward | 510,172.34 |
| Birch | 504,761.40 |
| kMeans | 500,011.45 |
| kMedoids_corr | 178,083.81 |
| kMedoids | 165,892.85 |
disp = hlp.Cluster_series_plot(data_df = product_df_full, cluster_df = eucl_df,headers = row_headers)
In this section we build forecasts at the cluster level using classical Holt-Winters models. Our goal is to evaluate the prediction quality on the individual series composing each cluster after performing a weighted split.
The evaluation metric is the MASE (mean absolute scaled error), a scale-independent metric computed relative to the MAE of a naive forecast.
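A minimal sketch of the MASE, scaling the out-of-sample forecast MAE by the in-sample MAE of a naive lag-1 forecast (hypothetical `mase` helper):

```python
import numpy as np

def mase(train, actual, forecast, m=1):
    """MASE: forecast MAE scaled by the in-sample MAE of a naive lag-m forecast."""
    train = np.asarray(train, dtype=float)
    naive_mae = np.mean(np.abs(train[m:] - train[:-m]))
    mae = np.mean(np.abs(np.asarray(actual, dtype=float)
                         - np.asarray(forecast, dtype=float)))
    return mae / naive_mae

score = mase(train=[1, 2, 3, 4], actual=[5, 6], forecast=[4, 5])
```

A score below 1 means the forecast beats the naive in-sample benchmark on average.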
group_series.head()
| 2016-01-10 00:00:00 | 2016-01-17 00:00:00 | 2016-01-24 00:00:00 | 2016-01-31 00:00:00 | 2016-02-07 00:00:00 | 2016-02-14 00:00:00 | 2016-02-21 00:00:00 | 2016-02-28 00:00:00 | 2016-03-06 00:00:00 | 2016-03-13 00:00:00 | ... | 2017-10-01 00:00:00 | 2017-10-08 00:00:00 | 2017-10-15 00:00:00 | 2017-10-22 00:00:00 | 2017-10-29 00:00:00 | 2017-11-05 00:00:00 | 2017-11-12 00:00:00 | 2017-11-19 00:00:00 | 2017-11-26 00:00:00 | 2017-12-03 00:00:00 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cluster | |||||||||||||||||||||
| 1 | 401 | 467 | 422 | 520 | 466 | 219 | 253 | 286 | 283 | 506 | ... | 350 | 422 | 224 | 232 | 179 | 193 | 192 | 107 | 134 | 19 |
| 2 | 97 | 103 | 66 | 36 | 37 | 36 | 46 | 296 | 379 | 109 | ... | 37 | 130 | 94 | 27 | 15 | 13 | 10 | 10 | 11 | 2 |
| 3 | 371 | 276 | 119 | 507 | 515 | 148 | 128 | 128 | 101 | 523 | ... | 498 | 466 | 476 | 170 | 122 | 116 | 743 | 183 | 168 | 21 |
| 4 | 1076 | 1185 | 491 | 278 | 251 | 1624 | 1366 | 451 | 1354 | 1260 | ... | 173 | 139 | 703 | 678 | 219 | 154 | 161 | 125 | 268 | 84 |
| 5 | 281 | 286 | 256 | 298 | 298 | 338 | 290 | 322 | 324 | 332 | ... | 393 | 311 | 328 | 429 | 311 | 288 | 249 | 143 | 211 | 34 |
5 rows × 100 columns
forecast = pd.DataFrame(forecasts, columns=group_series.columns[-fc:], index=group_series.index)
display(mase_variation.describe())
| Agg | Split | Forecast | Increase | Bias | |
|---|---|---|---|---|---|
| count | 113.00 | 113.00 | 113.00 | 113.00 | 113.00 |
| mean | 1.28 | 1.24 | 1.87 | 7.92 | -27.62 |
| std | 0.69 | 0.73 | 1.32 | 52.28 | 22.98 |
| min | 0.25 | 0.46 | 0.71 | -61.35 | -81.73 |
| 25% | 0.88 | 0.84 | 1.14 | -22.37 | -39.27 |
| 50% | 1.09 | 1.05 | 1.46 | -5.56 | -28.47 |
| 75% | 1.56 | 1.35 | 2.06 | 23.26 | -17.14 |
| max | 3.60 | 4.89 | 8.66 | 286.80 | 54.02 |
The percentage of clusters where forecast quality increases at the split level: we compare the average forecast quality at the split level to the forecast at the cluster's aggregation level.
The percentage of "good" forecasts, where quality decreases by less than 20%.
The percentage of products where the split forecast is better than a per-product (detail) forecast.
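Assuming `mase_variation` holds one row per cluster with the columns shown in the tables of this section, the three percentages could be computed along these lines (toy data; the column semantics are inferred from those tables):

```python
import pandas as pd

mase_variation = pd.DataFrame({
    "Agg":      [0.25, 0.35, 1.50],     # MASE of the aggregated-cluster forecast
    "Split":    [0.53, 0.30, 2.00],     # average MASE after the weighted split
    "Forecast": [1.00, 0.91, 1.80],     # average MASE of per-product forecasts
    "Increase": [112.72, -14.3, 33.3],  # % change in MASE, split vs. aggregate
})

increased = (mase_variation["Split"] < mase_variation["Agg"]).mean() * 100
good      = (mase_variation["Increase"] < 20).mean() * 100
better    = (mase_variation["Split"] < mase_variation["Forecast"]).mean() * 100
```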
display(mase_variation[good_splits].sort_values(by=["Agg"]).head())
Increased forecast quality: 55.75%
Good forecast quality: 77.88%
Better than detail forecast quality: 90.27%
| Agg | Split | Forecast | Increase | Bias | |
|---|---|---|---|---|---|
| 18 | 0.25 | 0.53 | 1.00 | 112.72 | -46.38 |
| 91 | 0.35 | 0.57 | 0.91 | 64.06 | -37.26 |
| 99 | 0.42 | 0.46 | 0.71 | 9.42 | -35.83 |
| 7 | 0.42 | 0.57 | 1.52 | 36.52 | -62.34 |
| 68 | 0.48 | 0.74 | 1.11 | 54.20 | -32.78 |
cluster = 99
plt.show()
Cluster 99: split 0.46, agg 0.42 *** Increase 9.42 %